Abstract

In this paper, we investigate a problem concerning quartets, which are a particular type of tree on four leaves. Loosely speaking, a set of quartets is said to be 'definitive' if it completely encapsulates the structure of some larger tree, and 'minimal' if it contains no redundant information. Here, we address the question of how large a minimal definitive quartet set on $n$ leaves can be, showing that the maximum size is at least $2n - 8$ for all $n > 4$. This is an enjoyable problem to work on, and we present a pretty construction, which employs symmetry.
1. Introduction

The motivation for this paper comes from the field of phylogenetics, which 
involves the study of the 'tree of life' depicting all living things, as popularised 
by Charles Darwin. In such a representation, existing species are drawn as 
leaves of the tree, while their ancestors are shown as interior vertices. 

In practice, the overall evolutionary (or 'phylogenetic') tree is built up by piecing together various smaller items of information. For example, if species $u$ and $v$ both have wings and species $w$ and $x$ do not, then it is likely that $u$ and $v$ have a common ancestor that is not shared by $w$ and $x$, and so the path from $u$ to $v$ on the tree of life should not intersect the path from $w$ to $x$.

The objective of this paper is to present a new result on quartets, which are 
a type of graph often used when reconstructing evolutionary trees in this way. 
We start by providing some necessary definitions. 

A phylogenetic tree is a tree with no vertices of degree 2 in which the leaves are labelled (distinctly) and the interior vertices are not. A phylogenetic tree is called binary if all interior vertices have degree exactly 3, and a quartet is defined to be a binary phylogenetic tree with precisely four leaves (note that such a tree is unique up to labelling). We use the notation $uv|wx$ to denote a quartet that is labelled as in Figure 1.

We say that a phylogenetic tree $T$ displays the quartet $uv|wx$ if $u$, $v$, $w$ and $x$ are all leaves in $T$ and the path from $u$ to $v$ does not intersect the path from $w$ to $x$ (or, equivalently, there exists a cut-edge in $T$ which separates $u$ and $v$ from $w$ and $x$). We say that $T$ displays a set of quartets $Q$ if $T$ displays each individual quartet $q \in Q$.

Figure 1: The quartet $uv|wx$.
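As an illustration only, the definition of display can be checked mechanically once a tree is encoded by its cut-edges. The Python sketch below assumes an encoding that the paper does not use (each interior cut-edge stored as the set of leaves on one side) and a hypothetical helper name `displays`; the example tree is the one with cherries $\{u, v\}$ and $\{w, y\}$ and leaf $x$ attached between them.

```python
# Representation assumed by this sketch (not taken from the paper): a tree is
# given by its interior cut-edges, each stored as the frozenset of leaves on one
# side of that edge. Pendant edges are omitted, since a cut-edge that isolates a
# single leaf can never separate one pair of leaves from another.
# A quartet uv|wx is written as ((u, v), (w, x)).


def displays(splits, quartet):
    """Return True if some cut-edge separates {u, v} from {w, x}.

    Assumes u, v, w and x are all leaves of the tree.
    """
    (u, v), (w, x) = quartet
    left, right = {u, v}, {w, x}
    for side in splits:
        # The edge separates the pairs if one pair lies wholly on the stored
        # side and the other pair lies wholly on the opposite side.
        if left <= side and not (right & side):
            return True
        if right <= side and not (left & side):
            return True
    return False


# Hypothetical example: the tree on {u, v, w, x, y} with cherries {u, v} and
# {w, y}, and with x attached to the edge between them.
tree = [frozenset("uv"), frozenset("wy")]
print(displays(tree, (("u", "v"), ("w", "x"))))  # True
print(displays(tree, (("u", "w"), ("v", "x"))))  # False
```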

For a set of quartets $Q$ with total leaf-set $L(Q)$, we say that $Q$ defines a tree $T$ (or that $Q$ is definitive for $T$) if $T$ is the unique phylogenetic tree with leaf-set $L(Q)$ that displays $Q$. Note that many quartet sets will not define any tree, either because they contain quartets that are incompatible (e.g. $\{uv|wx, uw|vx\}$) or because they are not informative enough to be particular to one tree (e.g. $\{uv|wx, uv|wy\}$ is displayed by four different phylogenetic trees with leaf-set $\{u, v, w, x, y\}$, as shown in Figure 2). An example of a quartet set that is definitive is $\{uv|wx, ux|wy\}$, which can be seen to define the left-most tree in Figure 2. Finally, we say that $Q$ is a minimal definitive quartet set (or that $Q$ is minimally definitive) if $Q$ defines some tree $T$ but, for all $q \in Q$, $Q - q$ does not define $T$ (for example, $\{uv|wx, ux|wy\}$ is minimally definitive, but $\{uv|wx, ux|wy, uv|wy\}$ is not).

Figure 2: Four different phylogenetic trees displaying the quartets $uv|wx$ and $uv|wy$.
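Definitiveness can likewise be tested by brute force on small leaf sets. The sketch below is a hypothetical illustration, not the authors' method: it enumerates every unrooted binary tree by inserting leaves one at a time, keeps the leaf set fixed when a quartet is removed, and searches binary trees only. That last restriction should be harmless: any non-binary tree displaying a quartet set can be resolved into at least three distinct binary trees that also display it, so a set is displayed by a unique phylogenetic tree exactly when it is displayed by a unique binary tree. The running time grows like $(2n-5)!!$, so this is only practical for a handful of leaves.

```python
from itertools import count


def displays(splits, qt):
    """True if some cut-edge (stored as one side) separates the pairs of qt."""
    (u, v), (w, x) = qt
    a, b = {u, v}, {w, x}
    return any((a <= s and not (b & s)) or (b <= s and not (a & s)) for s in splits)


def splits_of(edges, leaves):
    """Interior cut-edges of a tree given as an edge list, each stored as the
    side that avoids leaves[0], so that equal trees compare equal."""
    adj = {}
    for p, r in edges:
        adj.setdefault(p, []).append(r)
        adj.setdefault(r, []).append(p)
    leafset, result = set(leaves), set()
    for p, r in edges:
        if p in leafset or r in leafset:
            continue  # pendant edge: never separates two pairs of leaves
        side, stack, seen = set(), [r], {p, r}
        while stack:  # collect the leaves on r's side of the edge
            node = stack.pop()
            if node in leafset:
                side.add(node)
            stack += [nb for nb in adj[node] if nb not in seen]
            seen.update(adj[node])
        result.add(frozenset(leafset - side if leaves[0] in side else side))
    return frozenset(result)


def binary_trees(leaves):
    """Yield every unrooted binary tree on `leaves` (at least 3) exactly once,
    by inserting each further leaf onto every edge in turn."""
    fresh = count()

    def grow(edges, rest):
        if not rest:
            yield splits_of(edges, leaves)
            return
        for i, (p, r) in enumerate(edges):
            m = ("internal", next(fresh))  # new vertex subdividing edge (p, r)
            yield from grow(edges[:i] + edges[i + 1:] + [(p, m), (m, r), (m, rest[0])], rest[1:])

    centre = ("internal", next(fresh))
    yield from grow([(centre, leaf) for leaf in leaves[:3]], list(leaves[3:]))


def trees_displaying(quartets, leaves):
    return [t for t in binary_trees(leaves) if all(displays(t, x) for x in quartets)]


def defines(quartets, leaves):
    """The (binary) tree defined by `quartets` on the fixed leaf set, or None."""
    found = trees_displaying(quartets, leaves)
    return found[0] if len(found) == 1 else None


def minimally_definitive(quartets, leaves):
    tree = defines(quartets, leaves)
    return tree is not None and all(
        defines([p for p in quartets if p != x], leaves) != tree for x in quartets)


def q(text):  # "uv|wx" -> (("u", "v"), ("w", "x")); single-character labels
    return (tuple(text[:2]), tuple(text[3:]))


LEAVES = tuple("uvwxy")
print(len(trees_displaying([q("uv|wx"), q("uv|wy")], LEAVES)))  # 3
print(minimally_definitive([q("uv|wx"), q("ux|wy")], LEAVES))  # True
print(minimally_definitive([q("uv|wx"), q("ux|wy"), q("uv|wy")], LEAVES))  # False
```

The search finds three binary trees displaying $\{uv|wx, uv|wy\}$; the fourth tree of Figure 2 must then be the non-binary tree whose only interior edge separates $\{u, v\}$ from $\{w, x, y\}$.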

It is fairly straightforward to see that if $Q$ defines $T$, then $T$ must be binary and $Q$ must distinguish every interior edge of $T$ (a quartet $uv|wx$ is said to distinguish an edge $e \in T$ if $e$ is the unique cut-edge in $T$ that separates $u$ and $v$ from $w$ and $x$). Thus, since a binary tree on $n$ leaves always has exactly $n - 3$ interior edges, it follows that every definitive quartet set on $\{1, 2, \ldots, n\}$ must contain at least $n - 3$ quartets. Furthermore, it is known that for any binary phylogenetic tree $T$ on $n$ leaves, there is indeed a set of $n - 3$ quartets that defines $T$ (see, for example, [2], Corollary 6.3.10).
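The notion of distinguishing an edge, and the count of $n - 3$ interior edges, can be illustrated with a hypothetical cut-edge encoding (each interior edge stored as the set of leaves on one side; helper names are illustrative only). For the five-leaf tree with cherries $\{u, v\}$ and $\{w, y\}$, each quartet of $\{uv|wx, ux|wy\}$ distinguishes one of its two interior edges, while $uv|wy$ is separated by both edges and so distinguishes neither.

```python
def separating_edges(splits, quartet):
    """All cut-edges (each stored as one side) that separate {u, v} from {w, x}."""
    (u, v), (w, x) = quartet
    a, b = {u, v}, {w, x}
    return [s for s in splits if (a <= s and not (b & s)) or (b <= s and not (a & s))]


def distinguished_edges(splits, quartets):
    """Edges e for which some quartet has e as its unique separating cut-edge."""
    found = set()
    for q in quartets:
        edges = separating_edges(splits, q)
        if len(edges) == 1:
            found.add(edges[0])
    return found


# Hypothetical 5-leaf tree: a binary tree on n = 5 leaves has n - 3 = 2 interior edges.
tree = [frozenset("uv"), frozenset("wy")]
Q = [(("u", "v"), ("w", "x")), (("u", "x"), ("w", "y"))]
print(len(tree) == 5 - 3)                                      # True
print(distinguished_edges(tree, Q) == set(tree))               # True: every edge distinguished
print(len(separating_edges(tree, (("u", "v"), ("w", "y")))))  # 2, so uv|wy distinguishes nothing
```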

Hence, the remaining interest in minimal definitive quartet sets lies in the question of how large they can be. Examples have been produced that have size greater than $n - 3$, but until recently it was thought that the maximum possible size would be bounded by $n + c$ for some fixed constant $c$. However, Humphries (Theorem 3.4.1) then proved that there actually exist examples with size at least |n − 6, for all $n > 4$. In this paper, we will improve matters still further by constructing minimal definitive quartet sets of size $2n - 8$, for all $n \geq 5$.

2. Main Section 

This section will culminate in the inductive construction of minimal definitive quartet sets of size $2n - 8$. The structure of the section will be as follows: we shall start by stating three lemmas that will be useful to us; we shall then prove the result for $n = 6$, which will be the base case for our induction; we shall then also prove the $n = 7$ case, as a way to convey the ideas of the inductive step; and finally we shall prove the full result.

We start by making explicit a result that we have already noted: 

Lemma 1 ([2], Proposition 6.8.4). Let $Q$ be a set of quartets that defines a tree $T$. Then each interior edge of $T$ must be distinguished by at least one quartet in $Q$.

The converse of Lemma 1 is known not to be true in general. However, the following result, which will play an extremely important role in our construction, comes close:

Lemma 2 ([2], Theorem 6.8.8). Let $Q$ be a set of quartets that all contain a common leaf, and let $T$ be a tree displaying $Q$ for which each interior edge is distinguished by at least one quartet in $Q$. Then $Q$ defines $T$.

Often, a set of quartets $Q$ can be used to 'infer' a further quartet $q$, in the sense that every phylogenetic tree displaying $Q$ must also display $q$. The notation $Q \vdash q$ is used to denote such inferences. Numerous examples are known, but we shall only use one very simple one:

Lemma 3. $\{ab|de, bc|de\} \vdash ac|de$.
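For completeness, the reason behind Lemma 3 can be sketched as follows, writing $P(s, t)$ for the path between leaves $s$ and $t$ in a tree $T$ that displays $ab|de$ and $bc|de$ (the notation $P$ is introduced here, not taken from the paper):

```latex
% T displays ab|de and bc|de, so P(a,b) \cap P(d,e) = \emptyset and P(b,c) \cap P(d,e) = \emptyset.
P(a,c) \subseteq P(a,b) \cup P(b,c)
\;\Longrightarrow\;
P(a,c) \cap P(d,e) \subseteq \bigl(P(a,b) \cap P(d,e)\bigr) \cup \bigl(P(b,c) \cap P(d,e)\bigr) = \emptyset
```

so $T$ displays $ac|de$. The first inclusion holds because $P(a,b) \cup P(b,c)$ is a connected subgraph containing $a$ and $c$, and in a tree the path between two vertices lies inside every connected subgraph that contains both.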

As well as the three lemmas that we have stated, caterpillar trees will also play a major role in our proofs. The caterpillar tree on $i$ leaves, which we shall denote by $T_i$, is defined via Figure 3.
Figure 3: The caterpillar tree $T_i$.
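Figure 3 did not survive extraction, so the sketch below assumes the standard picture of $T_i$: leaves $1, \ldots, i$ attached in order along a path, so that its interior edges separate $\{1, \ldots, j\}$ from $\{j+1, \ldots, i\}$ for $2 \le j \le i - 2$. This assumption is consistent with the later remark that $T_k$ displays every $uv|wx$ with $u < v < w < x$, which the code checks for $i = 7$. The function names and the cut-edge encoding are hypothetical.

```python
from itertools import combinations


def caterpillar(i):
    """Interior edges of the caterpillar T_i, assuming leaves 1..i lie in order
    along the spine: edge j is stored as the side {1, ..., j}."""
    return [frozenset(range(1, j + 1)) for j in range(2, i - 1)]


def displays(splits, quartet):
    """True if some cut-edge (stored as one side) separates {u, v} from {w, x}."""
    (u, v), (w, x) = quartet
    a, b = {u, v}, {w, x}
    return any((a <= s and not (b & s)) or (b <= s and not (a & s)) for s in splits)


T7 = caterpillar(7)
print(len(T7))  # 4 interior edges, i.e. 7 - 3
# T_7 displays uv|wx whenever u < v < w < x (combinations yields sorted tuples).
print(all(displays(T7, ((u, v), (w, x))) for u, v, w, x in combinations(range(1, 8), 4)))  # True
```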

We shall now give an example of a minimal definitive quartet set on six leaves that has size four, thus fulfilling our $2n - 8$ target. Such sets have already been produced before now, but ours has a nice reversible symmetry to it that will later prove significant.
Lemma 4. The set of quartets

$Q_6 = \{12|35, 13|46, 12|56, 24|56\}$

is minimally definitive for the caterpillar tree $T_6$.

Proof. Let us first show that $Q_6$ is definitive. Note that we have $\{12|56, 24|56\} \vdash 14|56$, by Lemma 3, and hence $Q_6 \vdash \{12|35, 13|46, 14|56\}$. Since the three quartets in this subset all contain the common leaf 1 and collectively distinguish each interior edge of $T_6$, Lemma 2 shows that this subset defines $T_6$; as every tree displaying $Q_6$ also displays the subset, and $T_6$ itself displays $Q_6$, it follows that $Q_6$ defines $T_6$.

It remains to show that $Q_6$ is minimally definitive. If not, then there exists a quartet $q \in Q_6$ such that $Q_6 - q$ defines $T_6$. By Lemma 1, the only possibility for $q$ is $12|56$, since each of the other three quartets is the only quartet in $Q_6$ distinguishing one of the interior edges of $T_6$. However, the tree $T'$ shown in Figure 4 displays $Q_6 - 12|56$, and $T'$ is certainly distinct from $T_6$. Hence, it follows that $Q_6$ is indeed minimally definitive. □

Figure 4: A tree $T'$ displaying the quartet set $Q_6 - 12|56$.
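Lemma 4 is small enough to confirm by exhaustive search over the 105 binary trees on six leaves. Since Figure 4 is lost, the sketch assumes that $T'$ is the tree whose interior edges separate $\{3, 5\}$, $\{2, 4\}$ and $\{1, 3, 5\}$ from the remaining leaves; the search then reports that $T_6$ and this tree are the only binary trees displaying $Q_6 - 12|56$. The helper names and the cut-edge encoding are assumptions of this sketch.

```python
from itertools import count

LEAVES = (1, 2, 3, 4, 5, 6)


def displays(splits, qt):
    """True if some cut-edge (stored as one side) separates the two pairs of qt."""
    (u, v), (w, x) = qt
    a, b = {u, v}, {w, x}
    return any((a <= s and not (b & s)) or (b <= s and not (a & s)) for s in splits)


def canonical(sides, leaves=LEAVES):
    """Store each cut-edge by the side avoiding leaves[0], so equal trees compare equal."""
    full = frozenset(leaves)
    return frozenset(frozenset(s) if leaves[0] not in s else full - frozenset(s) for s in sides)


def binary_trees(leaves=LEAVES):
    """Yield each unrooted binary tree on `leaves` once, by stepwise leaf insertion."""
    fresh = count()

    def sides_of(edges):
        adj = {}
        for p, r in edges:
            adj.setdefault(p, []).append(r)
            adj.setdefault(r, []).append(p)
        for p, r in edges:
            if p in leaves or r in leaves:
                continue  # pendant edge: never separates two pairs
            side, stack, seen = set(), [r], {p, r}
            while stack:  # collect the leaves on r's side of the edge
                node = stack.pop()
                if node in leaves:
                    side.add(node)
                stack += [nb for nb in adj[node] if nb not in seen]
                seen.update(adj[node])
            yield side

    def grow(edges, rest):
        if not rest:
            yield canonical(sides_of(edges), leaves)
            return
        for i, (p, r) in enumerate(edges):
            m = ("internal", next(fresh))  # new vertex subdividing edge (p, r)
            yield from grow(edges[:i] + edges[i + 1:] + [(p, m), (m, r), (m, rest[0])], rest[1:])

    centre = ("internal", next(fresh))
    yield from grow([(centre, leaf) for leaf in leaves[:3]], list(leaves[3:]))


def q(text):  # "12|35" -> ((1, 2), (3, 5)); single-digit leaf labels only
    return ((int(text[0]), int(text[1])), (int(text[3]), int(text[4])))


Q6 = [q("12|35"), q("13|46"), q("12|56"), q("24|56")]
T6 = canonical([{1, 2}, {1, 2, 3}, {1, 2, 3, 4}])   # caterpillar T_6
T_PRIME = canonical([{3, 5}, {2, 4}, {1, 3, 5}])     # assumed shape of T'
ALL_TREES = list(binary_trees())


def trees_displaying(quartets):
    return [t for t in ALL_TREES if all(displays(t, x) for x in quartets)]


print(len(ALL_TREES))                # 105
print(trees_displaying(Q6) == [T6])  # True: Q_6 defines T_6
for dropped in Q6:
    rest = [x for x in Q6 if x != dropped]
    print(dropped, len(trees_displaying(rest)) > 1)  # the boolean is True each time
print(set(trees_displaying([x for x in Q6 if x != q("12|56")])) == {T6, T_PRIME})  # True
```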

We shall now see how to use the set $Q_6$ from Lemma 4 to produce a minimal definitive quartet set on seven leaves that has size six. This example, combined with the paragraph of discussion after the proof, is intended to help make clear the general strategy, which utilises the symmetry properties that we have observed, but the reader is free to proceed straight to the full proof of Theorem 6 if he so wishes.

Example 5. The set of quartets

$Q_7 = \{12|35, 13|46, 12|57, 24|57, 13|67, 35|67\}$

is minimally definitive for the caterpillar tree $T_7$.

Proof. As a rigorous proof is implicitly included within that of Theorem 6, we shall provide a slightly more informal treatment here. Firstly, note that Table 1 shows that we can again use Lemma 2 to prove definitiveness (it is worth observing the way that the quartets are paired up here). By Lemma 1, it then only remains to provide suitable trees $T''$ and $T'''$ displaying $Q_7 - 12|57$ and $Q_7 - 13|67$, respectively. But note that $T''$ (see Figure 5) can easily be formed from the tree $T'$ in Figure 4 (it is important to note the similarity between $Q_6$ and $Q_7$), while the symmetry of $Q_7$ (relabelling each leaf $i$ as $8 - i$ maps $Q_7$ to itself and exchanges $12|57$ with $13|67$) allows us to take $T'''$ to be the same as $T''$, but with the numbers reversed! □
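The contents of Table 1 did not survive extraction. As a stand-in, the following sketch (a hypothetical illustration, not a reconstruction of the table) applies Lemma 3 once to every pair of quartets in $Q_7$ that share a side, and then checks that the quartets containing leaf 1 distinguish every interior edge of $T_7$, which is what Lemma 2 needs. It finds $14|57$ from $\{12|57, 24|57\}$ and $15|67$ from $\{13|67, 35|67\}$, together with $12|37$ and $13|47$, which are also valid but not needed here. The caterpillar encoding (leaves $1, \ldots, 7$ in order) is an assumption, since Figure 3 is lost.

```python
from itertools import combinations


def q(text):
    """'12|35' -> frozenset({frozenset({1, 2}), frozenset({3, 5})}), an unordered quartet."""
    left, right = text.split("|")
    return frozenset({frozenset(map(int, left)), frozenset(map(int, right))})


def show(quartet):
    a, b = sorted(quartet, key=min)
    return "".join(map(str, sorted(a))) + "|" + "".join(map(str, sorted(b)))


def lemma3_consequences(quartets):
    """Every quartet obtained by one application of Lemma 3: {ab|de, bc|de} |- ac|de."""
    found = set()
    for q1, q2 in combinations(quartets, 2):
        for de in q1 & q2:             # a side the two quartets share
            (ab,) = q1 - {de}
            (bc,) = q2 - {de}
            if len(ab & bc) == 1:      # the other sides overlap in exactly b
                found.add(frozenset({ab ^ bc, de}))
    return found


def caterpillar(i):  # assumed: edge j of T_i is stored as the side {1, ..., j}
    return [frozenset(range(1, j + 1)) for j in range(2, i - 1)]


def separating(splits, quartet):
    a, b = tuple(quartet)
    return [s for s in splits if (a <= s and not (b & s)) or (b <= s and not (a & s))]


Q7 = [q(s) for s in ("12|35", "13|46", "12|57", "24|57", "13|67", "35|67")]
derived = lemma3_consequences(Q7)
print(sorted(show(x) for x in derived))  # ['12|37', '13|47', '14|57', '15|67']

# Keep the quartets (original or derived) containing leaf 1 and record the
# interior edges of T_7 that each of them distinguishes.
with_leaf_1 = [x for x in set(Q7) | derived if any(1 in side for side in x)]
T7 = caterpillar(7)
covered = {separating(T7, x)[0] for x in with_leaf_1 if len(separating(T7, x)) == 1}
print(covered == set(T7))  # True, so Lemma 2 applies
```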
Table 1: Some inferences that can be made from the quartet set $Q_7$.

Figure 5: Trees $T''$ and $T'''$ displaying the quartet sets $Q_7 - 12|57$ and $Q_7 - 13|67$, respectively.
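The two trees of Figure 5 can be rebuilt from $T'$ with two hypothetical helpers: `with_cherry`, which replaces a leaf by a cherry, and `reverse_labels`, which relabels leaf $i$ as $8 - i$. Since Figure 4 is lost, the sketch assumes that $T'$ has interior edges separating $\{3, 5\}$, $\{2, 4\}$ and $\{1, 3, 5\}$ from the other leaves. It also checks the symmetry of $Q_7$ that the proof relies on.

```python
def q(text):  # "12|35" -> ((1, 2), (3, 5)); single-digit labels
    return ((int(text[0]), int(text[1])), (int(text[3]), int(text[4])))


def displays(splits, qt):
    """True if some cut-edge (stored as one side) separates the two pairs of qt."""
    (u, v), (w, x) = qt
    a, b = {u, v}, {w, x}
    return any((a <= s and not (b & s)) or (b <= s and not (a & s)) for s in splits)


def with_cherry(splits, old, new):
    """Replace leaf `old` by the cherry {old, new}: `new` joins `old` on every
    side that contains it, and the cherry adds one interior edge."""
    return [s | {new} if old in s else s for s in splits] + [frozenset({old, new})]


def reverse_labels(splits, n):
    """Relabel leaf i as n + 1 - i."""
    return [frozenset(n + 1 - x for x in s) for s in splits]


def as_set(quartet):  # unordered form, for comparing sets of quartets
    return frozenset(map(frozenset, quartet))


def minus(quartets, dropped):
    return [x for x in quartets if x != q(dropped)]


Q7 = [q(s) for s in ("12|35", "13|46", "12|57", "24|57", "13|67", "35|67")]
# Q_7 is fixed by i -> 8 - i, which swaps 12|57 and 13|67.
mirrored = {as_set(((8 - u, 8 - v), (8 - w, 8 - x))) for (u, v), (w, x) in Q7}
print(mirrored == {as_set(x) for x in Q7})  # True

T_PRIME = [frozenset({3, 5}), frozenset({2, 4}), frozenset({1, 3, 5})]  # assumed T'
T2 = with_cherry(T_PRIME, 6, 7)  # T''
T3 = reverse_labels(T2, 7)       # T'''
print(all(displays(T2, x) for x in minus(Q7, "12|57")), displays(T2, q("12|57")))  # True False
print(all(displays(T3, x) for x in minus(Q7, "13|67")), displays(T3, q("13|67")))  # True False
```

Because $T_7$ displays both $12|57$ and $13|67$, the two `False` values also confirm that $T''$ and $T'''$ differ from $T_7$.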



As we shall now see, the inductive step in the full proof follows exactly the same procedure as in the example above. We shall always use caterpillars, and we shall show definitiveness by always adding an extra pair of quartets that together infer $1(n-2)|(n-1)n$. Proving minimality will then come down to constructing various trees displaying sets of the form $Q - q$. All but one of these will be formed from the trees of the previous stage of the induction, while the additional tree will be created by reversing numbers.

Theorem 6. Let $n \geq 5$ be an integer. Then there exists a minimal definitive quartet set on $n$ leaves that has size $2n - 8$.

Proof. We have already noted the result for $n \in \{5, 6\}$ (when $n = 5$ we have $2n - 8 = n - 3$, and any definitive set of size $n - 3$ is automatically minimal), and also for $n = 7$, for those who have read Example 5, so we shall now proceed inductively, using the set $Q_6$ defined in Lemma 4 as our base. Let us use $q_{6,1}$, $q_{6,2}$, $q_{6,3}$ and $q_{6,4}$, respectively, to denote the quartets $12|35$, $13|46$, $12|56$ and $24|56$, in that order (so $Q_6 = \{q_{6,1}, q_{6,2}, q_{6,3}, q_{6,4}\}$). For $k \geq 7$, let us then define $Q_k = \{q_{k,1}, q_{k,2}, \ldots, q_{k,2k-8}\}$ recursively from $Q_{k-1}$ as follows: (i) for all $i \leq 2k - 12$, set $q_{k,i} = q_{k-1,i}$; (ii) for $i \in \{2k - 11, 2k - 10\}$, set $q_{k,i}$ to be the same as $q_{k-1,i}$, but with occurrences of $k - 1$ replaced by $k$; and (iii) set $q_{k,2k-9} = 1(k-4)|(k-1)k$ and $q_{k,2k-8} = (k-4)(k-2)|(k-1)k$. It can be checked that this procedure produces the set $Q_7$ defined in Example 5.
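Rules (i) to (iii) translate directly into code. The sketch below (function names hypothetical) builds $Q_k$ as an ordered list, so that list position $i - 1$ holds $q_{k,i}$, and confirms that it reproduces $Q_7$ from Example 5 and has $2k - 8$ elements.

```python
def quartet(a, b, c, d):
    """The quartet ab|cd as ((a, b), (c, d))."""
    return ((a, b), (c, d))


Q6 = [quartet(1, 2, 3, 5), quartet(1, 3, 4, 6), quartet(1, 2, 5, 6), quartet(2, 4, 5, 6)]


def next_set(prev, k):
    """Build the ordered list Q_k = [q_{k,1}, ..., q_{k,2k-8}] from Q_{k-1}."""
    assert len(prev) == 2 * (k - 1) - 8

    def relabel(leaf):
        return k if leaf == k - 1 else leaf

    kept = prev[: 2 * k - 12]                                              # rule (i)
    renamed = [tuple(tuple(map(relabel, pair)) for pair in x)
               for x in prev[2 * k - 12:]]                                 # rule (ii)
    added = [quartet(1, k - 4, k - 1, k), quartet(k - 4, k - 2, k - 1, k)]  # rule (iii)
    return kept + renamed + added


def Q(k):
    """Q_k for k >= 6, built recursively from the base case Q_6 of Lemma 4."""
    current = Q6
    for j in range(7, k + 1):
        current = next_set(current, j)
    return current


print(Q(7))
# [((1, 2), (3, 5)), ((1, 3), (4, 6)), ((1, 2), (5, 7)), ((2, 4), (5, 7)), ((1, 3), (6, 7)), ((3, 5), (6, 7))]
print(all(len(Q(k)) == 2 * k - 8 for k in range(6, 15)))  # True
```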

Note that $|Q_k| = 2k - 8$ and so, by induction, it now suffices to prove that $Q_k$ is minimally definitive for the caterpillar tree $T_k$ given that $Q_{k-1}$ is minimally definitive for $T_{k-1}$. This is precisely what we shall now do.

First, let us check that $T_k$ displays $Q_k$. Note that $T_k$ displays all quartets $uv|wx$ for $u < v < w < x$, and $q_{k,2k-9}$ and $q_{k,2k-8}$ are certainly of this form. By induction, we can see that all other quartets in $Q_k$ also satisfy this property, and so $T_k$ does indeed display $Q_k$.
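The ordering argument can be spot-checked for small $k$. Assuming, as Figure 3 is lost, that $T_k$ is the caterpillar with leaves $1, \ldots, k$ in order, the hypothetical sketch below regenerates $Q_k$ with rules (i) to (iii) and confirms, for $6 \le k \le 14$, that every quartet is written with increasing labels and is displayed by $T_k$.

```python
def quartet(a, b, c, d):
    return ((a, b), (c, d))


def Q(k):
    """Q_k built by rules (i)-(iii) from Q_6, as an ordered list."""
    qs = [quartet(1, 2, 3, 5), quartet(1, 3, 4, 6), quartet(1, 2, 5, 6), quartet(2, 4, 5, 6)]
    for j in range(7, k + 1):
        renamed = [tuple(tuple(j if leaf == j - 1 else leaf for leaf in pair) for pair in x)
                   for x in qs[2 * j - 12:]]
        qs = qs[: 2 * j - 12] + renamed + [quartet(1, j - 4, j - 1, j),
                                           quartet(j - 4, j - 2, j - 1, j)]
    return qs


def caterpillar(i):  # assumed: edge j of T_i is stored as the side {1, ..., j}
    return [frozenset(range(1, j + 1)) for j in range(2, i - 1)]


def displays(splits, qt):
    """True if some cut-edge (stored as one side) separates the two pairs of qt."""
    (u, v), (w, x) = qt
    a, b = {u, v}, {w, x}
    return any((a <= s and not (b & s)) or (b <= s and not (a & s)) for s in splits)


for k in range(6, 15):
    increasing = all(u < v < w < x for (u, v), (w, x) in Q(k))
    shown = all(displays(caterpillar(k), x) for x in Q(k))
    print(k, increasing, shown)  # every line ends with: True True
```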

Next, we shall show that $Q_k$ is definitive for $T_k$, using the same argument as for when $k = 6$. By induction, we may assume that $\{q_{k,1}, q_{k,2}, \ldots, q_{k,2k-12}\} \vdash \{12|35, 13|46, 14|57, \ldots, 1(k-4)|(k-3)(k-1)\}$. Note that $q_{k,2k-11} = 1(k-5)|(k-2)k$ and $q_{k,2k-10} = (k-5)(k-3)|(k-2)k$, so $\{q_{k,2k-11}, q_{k,2k-10}\} \vdash 1(k-3)|(k-2)k$ by Lemma 3 (and so the induction does hold). Finally, Lemma 3 also implies that $\{q_{k,2k-9}, q_{k,2k-8}\} \vdash 1(k-2)|(k-1)k$. Hence, just as with the case when $k = 6$, we may use Lemma 2 to deduce that $Q_k$ is definitive for $T_k$.
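The definitiveness step can be spot-checked in the same spirit. The sketch below (hypothetical helpers; quartets stored as unordered pairs of sides; $T_k$ assumed to be the caterpillar with leaves in order) adds to $Q_k$ every quartet obtainable by one application of Lemma 3, keeps those containing leaf 1, and checks that they distinguish every interior edge of $T_k$ for $6 \le k \le 14$. Together with Lemma 2 this is the route taken in the proof; the check covers only these values of $k$, whereas the proof covers all of them.

```python
from itertools import combinations


def quartet(a, b, c, d):
    """The quartet ab|cd as an unordered pair of sides."""
    return frozenset({frozenset({a, b}), frozenset({c, d})})


def Q(k):
    """Q_k as an ordered list, built from Q_6 by rules (i)-(iii)."""
    qs = [quartet(1, 2, 3, 5), quartet(1, 3, 4, 6), quartet(1, 2, 5, 6), quartet(2, 4, 5, 6)]
    for j in range(7, k + 1):
        renamed = [frozenset(frozenset(j if leaf == j - 1 else leaf for leaf in side) for side in x)
                   for x in qs[2 * j - 12:]]
        qs = qs[: 2 * j - 12] + renamed + [quartet(1, j - 4, j - 1, j),
                                           quartet(j - 4, j - 2, j - 1, j)]
    return qs


def lemma3_consequences(quartets):
    """Every quartet obtained by one application of Lemma 3: {ab|de, bc|de} |- ac|de."""
    found = set()
    for q1, q2 in combinations(quartets, 2):
        for de in q1 & q2:             # a side the two quartets share
            (ab,) = q1 - {de}
            (bc,) = q2 - {de}
            if len(ab & bc) == 1:      # the other sides overlap in exactly b
                found.add(frozenset({ab ^ bc, de}))
    return found


def caterpillar(i):  # assumed: edge j of T_i is stored as the side {1, ..., j}
    return [frozenset(range(1, j + 1)) for j in range(2, i - 1)]


def separating(splits, qt):
    a, b = tuple(qt)
    return [s for s in splits if (a <= s and not (b & s)) or (b <= s and not (a & s))]


def covers_all_edges(k):
    """Do the quartets with leaf 1, among Q_k and its one-step Lemma 3 consequences,
    distinguish every interior edge of T_k? (T_k displays them all, as it displays Q_k.)"""
    pool = set(Q(k)) | lemma3_consequences(Q(k))
    tree = caterpillar(k)
    covered = set()
    for x in pool:
        if any(1 in side for side in x):
            edges = separating(tree, x)
            if len(edges) == 1:
                covered.add(edges[0])
    return covered == set(tree)


print(all(covers_all_edges(k) for k in range(6, 15)))  # True
print(quartet(1, 6, 7, 8) in lemma3_consequences(Q(8)))  # True: 1(k-2)|(k-1)k with k = 8
```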

It now only remains for us to show that $Q_k$ is minimally definitive. To do this, we need to show that for each $q_{k,i}$ there exists a tree $T_{k,i} \neq T_k$ that displays $Q_k - q_{k,i}$.

For $i \leq 2k - 10$, we can take $T_{k,i}$ to be the tree formed from $T_{k-1,i}$ by replacing vertex $k - 1$ with a 'cherry' $\{k-1, k\}$ (by which we mean a rooted binary tree with leaf-set $\{k-1, k\}$; for example, the tree $T''$ in Figure 5 is formed from the tree $T'$ in Figure 4 by replacing vertex 6 with the cherry $\{6, 7\}$). It is clear that $T_{k,i} \neq T_k$, since $T_{k-1,i} \neq T_{k-1}$. The proof that $T_{k,i}$ displays $Q_k - q_{k,i}$ follows from observing that (a) $T_{k,i}$ displays all quartets that are displayed by $T_{k-1,i}$, since deleting leaf $k$ from $T_{k,i}$ (and suppressing the resulting vertex of degree 2) recovers $T_{k-1,i}$, (b) for $w < k - 1$, $T_{k,i}$ displays the quartet $uv|wk$ if it displays $uv|w(k-1)$ (since $\{k-1, k\}$ forms a cherry), and hence $T_{k,i}$ displays $uv|wk$ if $T_{k-1,i}$ displays $uv|w(k-1)$, and (c) $T_{k,i}$ displays all quartets of the form $uv|(k-1)k$, again using the fact that $\{k-1, k\}$ forms a cherry.
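The cherry-replacement step can be tested exhaustively for $k = 7$. The sketch below enumerates all binary trees on six leaves, takes every tree $T_{6,i} \neq T_6$ that displays $Q_6 - q_{6,i}$ for $i \le 4 = 2k - 10$, replaces leaf 6 by the cherry $\{6, 7\}$, and checks that the result differs from $T_7$ and displays $Q_7 - q_{7,i}$. The cut-edge encoding, the caterpillar layout and all helper names are assumptions of this sketch.

```python
from itertools import count

L6, L7 = tuple(range(1, 7)), tuple(range(1, 8))


def displays(splits, qt):
    (u, v), (w, x) = qt
    a, b = {u, v}, {w, x}
    return any((a <= s and not (b & s)) or (b <= s and not (a & s)) for s in splits)


def canonical(sides, leaves):
    """Store each cut-edge by the side avoiding leaf 1, so equal trees compare equal."""
    full = frozenset(leaves)
    return frozenset(frozenset(s) if 1 not in s else full - frozenset(s) for s in sides)


def binary_trees(leaves):
    """Yield each unrooted binary tree on `leaves` once, by stepwise leaf insertion."""
    fresh = count()

    def sides_of(edges):
        adj = {}
        for p, r in edges:
            adj.setdefault(p, []).append(r)
            adj.setdefault(r, []).append(p)
        for p, r in edges:
            if p in leaves or r in leaves:
                continue  # pendant edge
            side, stack, seen = set(), [r], {p, r}
            while stack:  # collect the leaves on r's side of the edge
                node = stack.pop()
                if node in leaves:
                    side.add(node)
                stack += [nb for nb in adj[node] if nb not in seen]
                seen.update(adj[node])
            yield side

    def grow(edges, rest):
        if not rest:
            yield canonical(sides_of(edges), leaves)
            return
        for i, (p, r) in enumerate(edges):
            m = ("internal", next(fresh))
            yield from grow(edges[:i] + edges[i + 1:] + [(p, m), (m, r), (m, rest[0])], rest[1:])

    centre = ("internal", next(fresh))
    yield from grow([(centre, leaf) for leaf in leaves[:3]], list(leaves[3:]))


def with_cherry(tree, old, new):
    """Replace leaf `old` (not leaf 1) by the cherry {old, new}; sides still avoid leaf 1."""
    return frozenset([s | {new} if old in s else s for s in tree] + [frozenset({old, new})])


def q(text):  # "12|35" -> ((1, 2), (3, 5))
    return ((int(text[0]), int(text[1])), (int(text[3]), int(text[4])))


Q6 = [q(s) for s in ("12|35", "13|46", "12|56", "24|56")]
Q7 = [q(s) for s in ("12|35", "13|46", "12|57", "24|57", "13|67", "35|67")]
T6 = canonical([{1, 2}, {1, 2, 3}, {1, 2, 3, 4}], L6)
T7 = canonical([{1, 2}, {1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 4, 5}], L7)
TREES6 = list(binary_trees(L6))

ok, counts = True, []
for i in range(4):  # 0-based position of q_{7,i+1}; i + 1 <= 2k - 10 = 4
    rest6, rest7 = Q6[:i] + Q6[i + 1:], Q7[:i] + Q7[i + 1:]
    choices = [t for t in TREES6 if t != T6 and all(displays(t, x) for x in rest6)]
    counts.append(len(choices))
    for t in choices:  # every valid choice of T_{6,i} should work, not just one
        t7 = with_cherry(t, 6, 7)
        ok = ok and t7 != T7 and all(displays(t7, x) for x in rest7)
print(ok, all(c > 0 for c in counts))  # True True
```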

For $i = 2k - 8$, we may appeal to Lemma 1 (since $q_{k,2k-8}$ is the only quartet in $Q_k$ that distinguishes the interior edge of $T_k$ separating $\{k-1, k\}$ from the other leaves), and so this only leaves the case $i = 2k - 9$, for which we have the quartet $q_{k,2k-9} = 1(k-4)|(k-1)k$. To deal with this, let us take $T_{k,2k-9}$ to be the tree formed from $T_{k,3}$ (which displays $Q_k - 12|57$) by 'reversing' all the numbers, i.e. 1 becomes $k$, 2 becomes $k - 1$, 3 becomes $k - 2$, and so on. Note that $T_{k,2k-9} \neq T_k$, since $T_k$ is the 'reverse' of itself, and so it only remains to show that $T_{k,2k-9}$ displays $Q_k - q_{k,2k-9}$, which we shall now do.

First, note that the tree $T_{k,3}$ was formed by taking a tree displaying $Q_6 - 12|56$, replacing vertex 6 with a cherry $\{6, 7\}$, then replacing vertex 7 with a cherry $\{7, 8\}$, replacing vertex 8 with a cherry $\{8, 9\}$, and so on until replacing vertex $k - 1$ with a cherry $\{k-1, k\}$. Hence, $T_{k,3}$ must display every quartet of the form $uv|wx$ for $u < v < w < x$ and $w \geq 6$, and so $T_{k,2k-9}$ must display every quartet of the form $ab|cd$ for $a < b < c < d$ and $b \leq k - 5$.

This immediately covers every quartet in $Q_k - q_{k,2k-9}$ apart from three: $q_{k,2k-12} = (k-6)(k-4)|(k-3)(k-1)$, $q_{k,2k-10} = (k-5)(k-3)|(k-2)k$ and $q_{k,2k-8} = (k-4)(k-2)|(k-1)k$. Furthermore, since these are the 'opposites' of $q_{k,4} = 24|57$, $q_{k,2} = 13|46$ and $q_{k,1} = 12|35$, which are all displayed by $T_{k,3}$, we find that these three remaining quartets are also all displayed by $T_{k,2k-9}$. Hence, $T_{k,2k-9}$ displays $Q_k - q_{k,2k-9}$, and so we are done. □
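Finally, the reversal construction can be checked for a range of $k$ without any search. Since Figure 4 is lost, the sketch assumes that $T'$ has interior edges separating $\{3, 5\}$, $\{2, 4\}$ and $\{1, 3, 5\}$ from the other leaves; it builds $T_{k,3}$ from $T'$ by the successive cherry replacements described above, reverses the labels to obtain $T_{k,2k-9}$, and checks both display claims for $7 \le k \le 14$. Helper names are hypothetical.

```python
def quartet(a, b, c, d):
    return ((a, b), (c, d))


def Q(k):
    """Q_k built by rules (i)-(iii) from Q_6, as an ordered list."""
    qs = [quartet(1, 2, 3, 5), quartet(1, 3, 4, 6), quartet(1, 2, 5, 6), quartet(2, 4, 5, 6)]
    for j in range(7, k + 1):
        renamed = [tuple(tuple(j if leaf == j - 1 else leaf for leaf in pair) for pair in x)
                   for x in qs[2 * j - 12:]]
        qs = qs[: 2 * j - 12] + renamed + [quartet(1, j - 4, j - 1, j),
                                           quartet(j - 4, j - 2, j - 1, j)]
    return qs


def displays(splits, qt):
    """True if some cut-edge (stored as one side) separates the two pairs of qt."""
    (u, v), (w, x) = qt
    a, b = {u, v}, {w, x}
    return any((a <= s and not (b & s)) or (b <= s and not (a & s)) for s in splits)


def with_cherry(splits, old, new):
    """Replace leaf `old` by the cherry {old, new}."""
    return [s | {new} if old in s else s for s in splits] + [frozenset({old, new})]


def reverse_labels(splits, n):
    """Relabel leaf i as n + 1 - i."""
    return [frozenset(n + 1 - x for x in s) for s in splits]


T_PRIME = [frozenset({3, 5}), frozenset({2, 4}), frozenset({1, 3, 5})]  # assumed T'


def T_k3(k):
    """T_{k,3}: T' with leaf 6 -> cherry {6,7}, then 7 -> {7,8}, ..., k-1 -> {k-1,k}."""
    tree = T_PRIME
    for leaf in range(6, k):
        tree = with_cherry(tree, leaf, leaf + 1)
    return tree


for k in range(7, 15):
    qk, t3 = Q(k), T_k3(k)
    t_rev = reverse_labels(t3, k)                                # T_{k,2k-9}
    ok3 = all(displays(t3, x) for x in qk if x != qk[2])         # Q_k - q_{k,3}
    ok_rev = all(displays(t_rev, x) for x in qk if x != qk[-2])  # Q_k - q_{k,2k-9}
    # T_k displays q_{k,2k-9}, so t_rev failing to display it shows t_rev != T_k.
    print(k, ok3, ok_rev, not displays(t_rev, qk[-2]))  # each line ends: True True True
```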

3. Questions 

The obvious question is whether $2n - 8$ can be improved upon, and it would be interesting to know of any better examples. Throughout this paper, we have only used caterpillar trees, partly for simplicity, and so another question of interest would be to ask whether caterpillars can always be relied upon to provide the extremal cases. Finally, it would also be nice to obtain some sort of upper bound on the maximum possible size of a minimal definitive quartet set, other than the trivial $\binom{n}{4}$.